Evaluating extracted phrases and extending thesauri
Abstract
We describe an interface that uses the phrases occurring in a document collection as a basis for browsing the collection and accessing its contents. Phrases are automatically extracted from the document text to represent the subject matter of the collection. Clearly, the interface’s utility depends on how good these phrases are. We evaluate the system by comparing the phrases extracted from a large Web site to those in a thesaurus used by the organization responsible for the site. This analysis serves two purposes: it aids the user by verifying that the phrases extracted are relevant to, and provide good coverage of, the subject areas of the Web site and thesaurus; and it aids the thesaurus compiler by identifying phrases in widespread use that do not appear in the thesaurus.
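The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the normalization step (lowercasing and collapsing whitespace), the function names `normalize` and `compare_phrases`, and the sample phrases are all assumptions made for illustration. The sketch reports which extracted phrases also appear in the thesaurus, which are absent from it (candidates for the thesaurus compiler), and what fraction of thesaurus terms the extracted phrases cover.

```python
def normalize(phrase: str) -> str:
    """Lowercase and collapse whitespace so trivially different forms match.

    This normalization is an assumption; the abstract does not say how
    phrases were matched.
    """
    return " ".join(phrase.lower().split())


def compare_phrases(extracted, thesaurus):
    """Compare extracted phrases with thesaurus terms after normalization."""
    extracted_set = {normalize(p) for p in extracted}
    thesaurus_set = {normalize(p) for p in thesaurus}
    shared = extracted_set & thesaurus_set
    return {
        # Phrases found in both collections.
        "shared": shared,
        # Extracted phrases the thesaurus lacks: candidate additions.
        "missing_from_thesaurus": extracted_set - thesaurus_set,
        # Fraction of thesaurus terms that also occur as extracted phrases.
        "coverage": len(shared) / len(thesaurus_set) if thesaurus_set else 0.0,
    }


# Hypothetical sample data, not taken from the paper.
extracted = ["Food Security", "soil  erosion", "crop yield"]
thesaurus = ["food security", "soil erosion", "land tenure", "irrigation"]

result = compare_phrases(extracted, thesaurus)
print(sorted(result["shared"]))
print(sorted(result["missing_from_thesaurus"]))
print(result["coverage"])
```

Exact string matching after light normalization is the simplest possible choice; a real comparison would likely also need stemming or handling of word-order variants, which this sketch leaves out.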
Similar resources
Evaluation of Syntactic Phrase Indexing -- CLARIT NLP Track Report
The CLARIT NLP track effort is focused on evaluating the usefulness of syntactic phrases for document indexing. The CLARIT system has several NLP techniques integrated with the vector space retrieval model [Evans et al. 91, Evans et al. 95]. The NLP techniques used in CLARIT include morphological analysis, robust noun-phrase parsing, and automatic construction of first-order thesauri, among others...
Automatic Extraction of Cue Phrases for Cross-Corpus Dialogue Act Classification
In this paper, we present an investigation into the use of cue phrases as a basis for dialogue act classification. We define what we mean by cue phrases, and describe how we extract them from a manually labelled corpus of dialogue. We describe one method of evaluating the usefulness of such cue phrases, by applying them directly as a classifier to unseen utterances. Once we have extracted cue p...
Toward Automatic Compilation of Phrasal Thesaurus
Thesaurus, which links between linguistic expressions (or concepts) based on various semantic relations, is one of the most fundamental semantic resources in a broad range of NLP tasks. A lot of work has been carried out relying on thesauri, such as WordNet (Miller, 1995) and automatically created versions of it. The entries of most existing thesauri are either single words or word sequences inc...
Thesaurus Extension Using Web Search Engines
Maintaining and extending large thesauri is an important challenge facing digital libraries and IT businesses alike. In this paper we describe a method building on and extending existing methods from the areas of thesaurus maintenance, natural language processing, and machine learning to (a) extract a set of novel candidate concepts from text corpora and (b) to generate a small ranked list of s...
Topic-specific Web Searching based on a Real-text Dictionary
The contributions of this paper are twofold. First, we present a new type of dictionary that is intended as a search assistance in topic-specific Web searching. The method to construct the dictionary is a general method that can be applied to any reasonable topic. The first implementation deals with climate change. The dictionary has the following new features compared to standard dictionaries ...
Publication date: 2000